12 research outputs found

    Modulo scheduling with integrated register spilling for clustered VLIW architectures

    Get PDF
    Clustering is a technique to decentralize the design of future wide issue VLIW cores and enable them to meet the technology constraints in terms of cycle time, area and power dissipation. In a clustered design, registers and functional units are grouped in clusters so that new instructions are needed to move data between them. New aggressive instruction scheduling techniques are required to minimize the negative effect of resource clustering and delays in moving data around. In this paper we present a novel software pipelining technique that performs instruction scheduling with reduced register requirements, register allocation, register spilling and inter-cluster communication in a single step. The algorithm uses limited backtracking to reconsider previously taken decisions. This backtracking provides the algorithm with additional possibilities for obtaining high throughput schedules with low spill code requirements for clustered architectures. We show that the proposed approach outperforms previously proposed techniques and that it is very scalable independently of the number of clusters, the number of communication buses and communication latency. The paper also includes an exploration of some parameters in the design of future clustered VLIW cores.Peer ReviewedPostprint (published version

    Hierarchical clustered register file organization for VLIW processors

    Get PDF
    Technology projections indicate that wire delays will become one of the biggest constraints in future microprocessor designs. To avoid long wire delays and therefore long cycle times, processor cores must be partitioned into components so that most of the communication is done locally. In this paper, we propose a novel register file organization for VLIW cores that combines clustering with a hierarchical register file organization. Functional units are organized in clusters, each one with a local first level register file. The local register files are connected to a global second level register file, which provides access to memory. All intercluster communications are done through the second level register file. This paper also proposes MIRS-HC, a novel modulo scheduling technique that simultaneously performs instruction scheduling, cluster selection, inserts communication operations, performs register allocation and spill insertion for the proposed organization. The results show that although more cycles are required to execute applications, the execution time is reduced due to a shorter cycle time. In addition, the combination of clustering and hierarchy provides a larger design exploration space that trades-off performance and technology requirements.Peer ReviewedPostprint (published version

    UKWMO

    No full text
    SIGLEAvailable from British Library Lending Division - LD:GPB-5354 / BLDSC - British Library Document Supply CentreGBUnited Kingdo

    Software and hardware techniques to optimize register file utilization in VLIW

    No full text
    High-performance microprocessors are currently designed with the purpose of exploiting instruction level parallelism (ILP). The techniques used in their design and the aggressive scheduling techniques used to exploit this ILP tend to increase the register requirements of the loops. This paper reviews hardware and software techniques that alleviate the high register demands of aggressive scheduling heuristics on VLIW cores. From the software point of view, instruction scheduling can stretch lifetimes and reduce the register pressure. If more registers than those available in the architecture are required, some actions (such as the injection of spill code) have to be applied to reduce this pressure, at the expense of some performance degradation. From the hardware point of view, this degradation could be reduced if a high-capacity register file were included without causing a negative impact on the design of the processor (cycle time, area and power dissipation). Novel organizations for the register file based on clustering and hierarchical organization are necessary to meet the technology constraints. This paper proposes the used of a clustered organization and proposes an aggressive instruction scheduling technique that minimizes the negative effect of the limitations imposed by the register file organization.Peer Reviewe

    Mirs: modulo scheduling with integrated register spilling

    No full text
    The overlapping of loop iterations in software pipelining techniques imposes high register requirements. The schedule for a loop is valid if it requires at most the number of registers available in the target architecture. Otherwise its register requirements have to be reduced by spilling registers to memory. Previous proposals for spilling in software pipelined loops require a two-step process. The first step performs the actual instruction scheduling without register constraints. The second step adds (if required) spill code and reschedules the modified loop. The process is repeated until a valid schedule, requiring no more registers than those available, is found. The paper presents MIRS (Modulo scheduling with Integrated Register Spilling), a novel register-constrained modulo scheduler that performs modulo scheduling and register spilling simultaneously in a single step. The algorithm is iterative and uses backtracking to undo previous scheduling decisions whenever resource or dependence conflicts appear. MIRS is compared against a state-of-the-art two-step approach already described in the literature. For this purpose, a workbench composed of a large set of loops from the Perfect Club and a set of processor configurations are used. On the average, for the loops that require spill code a speed-up in the range 14–31% and a reduction of the memory traffic by a factor in the range 0.90–0.72 are achieved.Peer ReviewedPostprint (published version

    Hierarchical clustered register file organization for VLIW processors

    No full text
    Technology projections indicate that wire delays will become one of the biggest constraints in future microprocessor designs. To avoid long wire delays and therefore long cycle times, processor cores must be partitioned into components so that most of the communication is done locally. In this paper, we propose a novel register file organization for VLIW cores that combines clustering with a hierarchical register file organization. Functional units are organized in clusters, each one with a local first level register file. The local register files are connected to a global second level register file, which provides access to memory. All intercluster communications are done through the second level register file. This paper also proposes MIRS-HC, a novel modulo scheduling technique that simultaneously performs instruction scheduling, cluster selection, inserts communication operations, performs register allocation and spill insertion for the proposed organization. The results show that although more cycles are required to execute applications, the execution time is reduced due to a shorter cycle time. In addition, the combination of clustering and hierarchy provides a larger design exploration space that trades-off performance and technology requirements.Peer Reviewe

    Modulo scheduling with integrated register spilling for clustered VLIW architectures

    No full text
    Clustering is a technique to decentralize the design of future wide issue VLIW cores and enable them to meet the technology constraints in terms of cycle time, area and power dissipation. In a clustered design, registers and functional units are grouped in clusters so that new instructions are needed to move data between them. New aggressive instruction scheduling techniques are required to minimize the negative effect of resource clustering and delays in moving data around. In this paper we present a novel software pipelining technique that performs instruction scheduling with reduced register requirements, register allocation, register spilling and inter-cluster communication in a single step. The algorithm uses limited backtracking to reconsider previously taken decisions. This backtracking provides the algorithm with additional possibilities for obtaining high throughput schedules with low spill code requirements for clustered architectures. We show that the proposed approach outperforms previously proposed techniques and that it is very scalable independently of the number of clusters, the number of communication buses and communication latency. The paper also includes an exploration of some parameters in the design of future clustered VLIW cores.Peer Reviewe

    Mirs: modulo scheduling with integrated register spilling

    No full text
    The overlapping of loop iterations in software pipelining techniques imposes high register requirements. The schedule for a loop is valid if it requires at most the number of registers available in the target architecture. Otherwise its register requirements have to be reduced by spilling registers to memory. Previous proposals for spilling in software pipelined loops require a two-step process. The first step performs the actual instruction scheduling without register constraints. The second step adds (if required) spill code and reschedules the modified loop. The process is repeated until a valid schedule, requiring no more registers than those available, is found. The paper presents MIRS (Modulo scheduling with Integrated Register Spilling), a novel register-constrained modulo scheduler that performs modulo scheduling and register spilling simultaneously in a single step. The algorithm is iterative and uses backtracking to undo previous scheduling decisions whenever resource or dependence conflicts appear. MIRS is compared against a state-of-the-art two-step approach already described in the literature. For this purpose, a workbench composed of a large set of loops from the Perfect Club and a set of processor configurations are used. On the average, for the loops that require spill code a speed-up in the range 14–31% and a reduction of the memory traffic by a factor in the range 0.90–0.72 are achieved.Peer Reviewe

    Software and hardware techniques to optimize register file utilization in VLIW

    No full text
    High-performance microprocessors are currently designed with the purpose of exploiting instruction level parallelism (ILP). The techniques used in their design and the aggressive scheduling techniques used to exploit this ILP tend to increase the register requirements of the loops. This paper reviews hardware and software techniques that alleviate the high register demands of aggressive scheduling heuristics on VLIW cores. From the software point of view, instruction scheduling can stretch lifetimes and reduce the register pressure. If more registers than those available in the architecture are required, some actions (such as the injection of spill code) have to be applied to reduce this pressure, at the expense of some performance degradation. From the hardware point of view, this degradation could be reduced if a high-capacity register file were included without causing a negative impact on the design of the processor (cycle time, area and power dissipation). Novel organizations for the register file based on clustering and hierarchical organization are necessary to meet the technology constraints. This paper proposes the used of a clustered organization and proposes an aggressive instruction scheduling technique that minimizes the negative effect of the limitations imposed by the register file organization.Peer Reviewe

    Register constrained modulo scheduling

    No full text
    Software pipelining is an instruction scheduling technique that exploits the instruction level parallelism (ILP) available in loops by overlapping operations from various successive loop iterations. The main drawback of aggressive software pipelining techniques is their high register requirements. If the requirements exceed the number of registers available in the target architecture, some steps need to be applied to reduce the register pressure (incurring some performance degradation): reduce iteration overlapping or spilling some lifetimes to memory. In the first part, we propose a set of heuristics to improve the spilling process and to better decide between adding spill code or directly decreasing the execution rate of iterations. The experimental evaluation, over a large number of representative loops and for a processor configuration, reports an increase in performance by a factor of 1.29 and a reduction of memory traffic by a factor of 1.36. In the second part, we analyze the use of backtracking and propose a novel approach for simultaneous instruction scheduling and register spilling in modulo scheduling: MIPS (modulo scheduling with integrated register spilling). The experimental evaluation reports an increase in performance by a factor of 1.46 and a reduction of the memory traffic by a factor of 1.66 (or an additional 1.13 and 1.22 with regard to the proposal in the first part). These improvements are achieved at the expense of a reasonable increase in the compilation time.Peer Reviewe
    corecore